[One Workflow](scale): Lazy-load workflow step I/O #253547

Merged
rosomri merged 24 commits into elastic:main from rosomri:break_execution_api on Feb 19, 2026

@rosomri rosomri commented Feb 17, 2026

Summary

Reduces memory pressure and network payload size by lazy-loading workflow step execution I/O data instead of fetching it all upfront.

[Video attachment: Screen.Recording.2026-02-18.at.13.18.51.mov]
  • Lazy-load step I/O: Execution polling (loadExecutionThunk) now requests lightweight data (includeInput=false, includeOutput=false). Full step input/output is fetched on demand — when the user clicks a step's tab or hovers a template expression in the YAML editor.
  • Server-side source filtering: getWorkflowExecution accepts includeInput/includeOutput query params and applies _source_excludes on Elasticsearch mget/search calls, avoiding large payloads that can cause OOM.
  • Bidirectional React Query cache: The execution detail panel (useStepExecution) and the YAML editor hover provider share a single cache via queryClient.setQueryData, so step I/O fetched by one is reused by the other and no duplicate HTTP request is made, regardless of access order.
  • Cache cleanup on execution switch: Cached step data is cleared (removeQueries) when navigating to a different execution, preventing memory buildup.
  • Template hover priority: Reordered provideCustomHover so template expression hovers ({{ }}) take precedence over validation decoration tooltips.
  • Pure hover enrichment: Refactored ensureStepData → fetchStepDataIfNeeded to return enriched data instead of mutating the shared executionContext ref. Removed redundant fetchedStepIds tracking that caused a caching bug on repeated hovers.
  • Extracted useLazyStepExecutionFetcher hook: Moved inline fetch logic out of the YAML editor component into a dedicated hook for readability and testability.
  • Narrowed memo deps: tabs memo in WorkflowStepExecutionDetails now depends on hasInput/hasError booleans instead of the full stepExecution object.
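
The "pure hover enrichment" item can be illustrated with a minimal sketch. The `StepData` and `ExecutionContext` shapes and the `enrichContext` name are assumptions made for illustration; the PR does not show the real types or the body of `fetchStepDataIfNeeded`. The point is only that the function returns a new object and leaves the shared context untouched.

```typescript
// Hypothetical shapes; the real executionContext type is not shown in this PR.
interface StepData {
  status: string;
  input?: unknown;
  output?: unknown;
}

interface ExecutionContext {
  steps: Record<string, StepData>;
}

/**
 * Returns a context in which `stepName` carries the fetched I/O.
 * The context passed in is never modified.
 */
function enrichContext(
  context: ExecutionContext,
  stepName: string,
  fetched: Pick<StepData, 'input' | 'output'> | null
): ExecutionContext {
  const existing = context.steps[stepName];
  // Nothing to enrich: unknown step, fetch returned null, or output already present.
  if (!existing || fetched === null || existing.output !== undefined) {
    return context;
  }
  return {
    ...context,
    steps: { ...context.steps, [stepName]: { ...existing, ...fetched } },
  };
}
```

Because the input is returned unchanged when there is nothing to add, callers can compare references to tell whether enrichment happened.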

Example flows

1. Execution polling — lightweight, no I/O

GET /api/workflowExecutions/exec-123?includeInput=false&includeOutput=false

Returns execution metadata and step statuses/durations, but input and output fields are excluded at the Elasticsearch _source level. This runs every poll cycle.
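
A rough sketch of how the two flags could translate into `_source_excludes`. `buildSourceExcludes`, `buildMgetParams`, and the `MgetParams` shape are hypothetical names for illustration, not the code in this PR; only the `input`/`output` field names and the `_source_excludes` option come from the description above.

```typescript
interface IoFlags {
  includeInput?: boolean;
  includeOutput?: boolean;
}

// Assumed request shape; only `_source_excludes` is taken from the PR text.
interface MgetParams {
  index: string;
  ids: string[];
  _source_excludes?: string[];
}

// Lists the step fields to drop at the Elasticsearch _source level.
function buildSourceExcludes({ includeInput = false, includeOutput = false }: IoFlags = {}): string[] {
  const excludes: string[] = [];
  if (!includeInput) excludes.push('input');
  if (!includeOutput) excludes.push('output');
  return excludes;
}

function buildMgetParams(index: string, ids: string[], flags: IoFlags): MgetParams {
  const params: MgetParams = { index, ids };
  const excludes = buildSourceExcludes(flags);
  // Leave the option out entirely when nothing is excluded.
  if (excludes.length > 0) {
    params._source_excludes = excludes;
  }
  return params;
}
```

With both flags false, as in the polling request, the params carry `_source_excludes: ['input', 'output']`.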

2. Hovering a template expression — lazy fetch + cache

User hovers {{ steps.search.output.hits }} in the YAML editor:

1. Hover provider calls fetchStepExecutionData("search")
2. Hook maps "search" → step doc ID "step-doc-abc"
3. React Query cache miss → GET /api/workflowExecutions/exec-123/steps/step-doc-abc
4. Response stored in cache: queryClient.setQueryData(["stepExecution", "exec-123", "step-doc-abc"], data)
5. Hover tooltip shows the resolved value

Second hover on the same step (or any steps.search.* expression):
1. fetchStepExecutionData("search") → cache hit → no HTTP request
2. Hover tooltip shows the resolved value immediately

For terminal steps, useStepExecution uses staleTime: Infinity — the cached data never goes stale for the lifetime of that execution.
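
The hover flow above can be sketched without React Query, using a plain `Map` as a stand-in for the query cache. Everything here is an assumption for illustration: `createLazyStepFetcher`, the `StepExecution` shape, and the injected `fetchStep` function are not the PR's `useLazyStepExecutionFetcher` implementation. Only the key layout `["stepExecution", executionId, stepDocId]` is taken from the description.

```typescript
// Hypothetical step document shape.
interface StepExecution {
  id: string;
  output?: unknown;
}

type FetchStep = (executionId: string, stepDocId: string) => Promise<StepExecution>;

function createLazyStepFetcher(
  executionId: string,
  stepDocIdByName: Map<string, string>,
  fetchStep: FetchStep
) {
  // Stand-in for the React Query cache: one promise per query key.
  const cache = new Map<string, Promise<StepExecution>>();

  return (stepName: string): Promise<StepExecution | null> => {
    // Map the step name used in the template to its step document ID.
    const stepDocId = stepDocIdByName.get(stepName);
    if (stepDocId === undefined) return Promise.resolve(null);

    const key = JSON.stringify(['stepExecution', executionId, stepDocId]);
    const cached = cache.get(key);
    if (cached !== undefined) return cached; // cache hit, no HTTP request

    const pending = fetchStep(executionId, stepDocId); // cache miss
    // Drop failed requests so a later hover can retry.
    pending.catch(() => {
      cache.delete(key);
    });
    cache.set(key, pending);
    return pending;
  };
}
```

Storing the promise rather than the resolved value also collapses two hovers that arrive while the first request is still in flight into a single request.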

3. Opening the I/O tab — served from cache

After the hover above already fetched step-doc-abc, user clicks the step and opens the Output tab:

1. useStepExecution("exec-123", "step-doc-abc", "completed") runs
2. React Query finds ["stepExecution", "exec-123", "step-doc-abc"] in cache
3. No HTTP request — data renders immediately

This works in both directions: if the user clicks the Output tab first, the hover provider finds the data in cache on subsequent hovers.

4. Switching execution — cache cleanup

1. User selects execution "exec-456"
2. useEffect cleanup fires: queryClient.removeQueries({ queryKey: ["stepExecution", "exec-123"] })
3. All cached step I/O for the previous execution is evicted
4. Fresh lightweight polling starts for "exec-456"
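
The eviction in step 2 relies on React Query treating `queryKey` as a prefix filter by default, so every key beginning with `["stepExecution", "exec-123"]` is removed. The sketch below imitates that behaviour on a plain `Map`; `matchesPrefix` and `removeByPrefix` are illustrative names, and the comparison is simplified to `===` on each element, which is sufficient for the string keys used here.

```typescript
type QueryKey = readonly unknown[];

// True when every element of `prefix` equals the element at the same position in `key`.
function matchesPrefix(key: QueryKey, prefix: QueryKey): boolean {
  return prefix.length <= key.length && prefix.every((part, i) => part === key[i]);
}

// Deletes every entry whose key starts with `prefix`; returns how many were removed.
function removeByPrefix<T>(cache: Map<QueryKey, T>, prefix: QueryKey): number {
  let removed = 0;
  for (const key of Array.from(cache.keys())) {
    if (matchesPrefix(key, prefix)) {
      cache.delete(key);
      removed += 1;
    }
  }
  return removed;
}
```

Entries for other executions, such as `["stepExecution", "exec-456", ...]`, do not match the prefix and stay in the cache.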

Test plan

  • get_workflow_execution.test.ts — Verifies _source_excludes is correctly passed to esClient.mget and searchStepExecutions based on includeInput/includeOutput flags
  • get_workflow_execution_by_id.test.ts — Updated existing route tests; added cases verifying query params are parsed and forwarded to the API layer
  • use_step_execution.test.ts — Verifies staleTime: Infinity and no polling for terminal steps; polling at 5s for running steps; polling stops on status transition
  • workflow_execution_detail.test.tsx — Verifies removeQueries is called on unmount and when executionId changes
  • unified_hover_provider.test.ts — Verifies hover values persist across multiple invocations, enrichment skipped when output already present, graceful fallback when fetch returns null
  • workflow_yaml_editor.test.tsx — Updated test wrapper to include QueryClientProvider for useLazyStepExecutionFetcher
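
The behaviour that `use_step_execution.test.ts` checks can be summarised as a small function from step status to query options. This is a sketch under assumptions: `getStepQueryOptions` is a hypothetical name, the set of terminal statuses is a guess (the PR only shows `"completed"`), and `staleTime: 0` for running steps is assumed rather than stated. The `staleTime: Infinity` and 5s polling values are from the test plan.

```typescript
// Assumed terminal statuses; only "completed" appears in the PR text.
const TERMINAL_STATUSES = new Set<string>(['completed', 'failed', 'cancelled', 'skipped']);

interface StepQueryOptions {
  staleTime: number;
  refetchInterval: number | false;
}

function getStepQueryOptions(status: string): StepQueryOptions {
  if (TERMINAL_STATUSES.has(status)) {
    // Terminal step: the data can no longer change, so never refetch or poll.
    return { staleTime: Infinity, refetchInterval: false };
  }
  // Non-terminal step: poll every 5 seconds until the status becomes terminal.
  return { staleTime: 0, refetchInterval: 5000 };
}
```

Recomputing the options whenever the status changes is what makes polling stop on a status transition.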

@rosomri rosomri requested a review from a team as a code owner February 17, 2026 18:59
@rosomri rosomri added the release_note:skip, backport:skip, and Team:One Workflow labels Feb 17, 2026
@rosomri rosomri marked this pull request as draft February 17, 2026 19:00
@rosomri rosomri marked this pull request as ready for review February 18, 2026 11:28
Comment on lines +66 to +67
includeInput = true,
includeOutput = true,

When includeInput and includeOutput are omitted (both are optional), the API includes both by default.

From an API semantics perspective, that feels counterintuitive. Optional flags that default to “included” can be surprising, especially if they affect payload size or sensitive data exposure.

Would it make more sense to default both to false, and only include input/output when explicitly requested? That would make the API more explicit and predictable.

Agreed - I initially set them to true by default for backward compatibility, but you’re right. I’ll switch them to false and share an update in the channel.

@semd done

@rosomri rosomri requested a review from semd February 19, 2026 09:18
@rosomri rosomri enabled auto-merge (squash) February 19, 2026 10:12
@semd semd left a comment

This is a great job @rosomri 🎸
LGTM!

elasticmachine commented Feb 19, 2026

💔 Build Failed

Failed CI Steps

Test Failures

  • [job] [logs] Jest Tests #3 / WorkflowsService getWorkflowExecution should return workflow execution with steps

Metrics [docs]

Module Count

Fewer modules leads to a faster build time

| id | before | after | diff |
| --- | --- | --- | --- |
| workflowsManagement | 1285 | 1287 | +2 |

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

| id | before | after | diff |
| --- | --- | --- | --- |
| workflowsManagement | 1.5MB | 1.5MB | +2.2KB |

@rosomri rosomri merged commit 1094e4e into elastic:main Feb 19, 2026
16 checks passed
ersin-erdal pushed a commit to ersin-erdal/kibana that referenced this pull request Feb 19, 2026
Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
rosomri added a commit that referenced this pull request Feb 20, 2026
…load I/O change (#254087)

## Summary

Fixes workflow execution output retrieval for agent-builder consumers.

After [#253547](#253547) changed
`getWorkflowExecution` to exclude step I/O by default,
`getExecutionState` was no longer receiving step outputs - causing
`getWorkflowOutput` to always return `null`.

This passes `includeOutput: true` explicitly so the output is available
when the execution completes.
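
A simplified sketch of why the exclude-by-default change produced `null` for consumers that did not opt in. `projectStep` stands in for the server-side filtering and `getLastStepOutput` for a consumer that reads the final output; both names and the `StepExecutionDoc` shape are hypothetical, and the real `getExecutionState`/`getWorkflowOutput` logic is not shown in this PR.

```typescript
interface StepExecutionDoc {
  stepId: string;
  status: string;
  input?: unknown;
  output?: unknown;
}

interface IoOptions {
  includeInput?: boolean;
  includeOutput?: boolean;
}

// Stand-in for the server-side filtering: I/O is dropped unless explicitly requested.
function projectStep(
  step: StepExecutionDoc,
  { includeInput = false, includeOutput = false }: IoOptions = {}
): StepExecutionDoc {
  const projected: StepExecutionDoc = { stepId: step.stepId, status: step.status };
  if (includeInput) projected.input = step.input;
  if (includeOutput) projected.output = step.output;
  return projected;
}

// Hypothetical consumer: reads the last step's output, or null when it is absent.
function getLastStepOutput(steps: StepExecutionDoc[]): unknown {
  const last = steps[steps.length - 1];
  return last?.output ?? null;
}
```

Calling `projectStep(step)` with no options yields a document without `output`, so the consumer sees `null`; passing `{ includeOutput: true }` restores the value.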

Labels

backport:skip, release_note:skip, Team:One Workflow, v9.4.0
